|
CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases that is more robust to outliers and identifies clusters having non-spherical shapes and size variances. == Drawbacks of traditional algorithms == The popular K-means clustering algorithm minimizes the sum of squared errors criterion: : Given large differences in sizes or geometries of different clusters, the square error method could split the large clusters to minimize the square error, which is not always correct. Also, with hierarchic clustering algorithms these problems exist as none of the distance measures between clusters () tend to work with different cluster shapes. Also the running time is high when n is large. The problem with the BIRCH algorithm is that once the clusters are generated after step 3, it uses centroids of the clusters and assigns each data point to the cluster with the closest centroid. Using only the centroid to redistribute the data has problems when clusters lack uniform sizes and shapes. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「CURE data clustering algorithm」の詳細全文を読む スポンサード リンク
|